Abstract — We define the highly available local leader election
problem, a generalization of the leader election problem
for partitionable systems. We propose a protocol that solves
the problem efficiently and give some performance measurements
of our implementation. The local leader election service
has proven useful in the design and implementation
of several fail-aware services for partitionable systems.

Keywords — Local leader election, partitionable systems,
timed asynchronous systems, global leader election.



I. Introduction

THE leader election problem [1] requires that a unique
leader be elected from a given set of processes. The
problem has been widely studied in the research community
[2], [3], [4], [5], [6]. One reason for this wide interest is
that many distributed protocols need an election protocol
as a sub-protocol. For example, in an atomic broadcast
protocol the processes could elect a leader that orders the
broadcasts so that all correct processes deliver broadcast
messages in the same order. The highly available leader
election problem was defined in [7] as follows: (S) at any
point in time there exists at most one leader, and (T) when
there is no leader at time s, then within at most k time
units a new leader is elected.
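
The two requirements above (at most one leader at any time, and a
new leader within k time units of any leaderless instant) can be
checked against a recorded run. The following is a minimal sketch,
not part of the protocol proposed in this paper. The trace format
(a time-ordered list of samples, each giving the set of processes
that currently consider themselves leader) and the function names
are assumptions made for illustration.

```python
from typing import List, Set, Tuple

# Hypothetical trace format: (sample time, processes acting as leader).
Trace = List[Tuple[float, Set[str]]]


def satisfies_safety(trace: Trace) -> bool:
    """At every sampled time there is at most one leader."""
    return all(len(leaders) <= 1 for _, leaders in trace)


def satisfies_timeliness(trace: Trace, k: float) -> bool:
    """Every leaderless period observed in the trace lasts at most k."""
    gap_start = None  # time of the first sample of the current leaderless period
    for time, leaders in trace:
        if not leaders:
            if gap_start is None:
                gap_start = time
            if time - gap_start > k:
                return False  # still leaderless after more than k time units
        else:
            if gap_start is not None and time - gap_start > k:
                return False  # a leader appeared, but too late
            gap_start = None
    return True


trace = [(0.0, {"p"}), (1.0, set()), (2.0, set()), (3.0, {"q"})]
print(satisfies_safety(trace), satisfies_timeliness(trace, k=2.0))
```

Because the check only sees sampled instants, it can miss violations
that occur between samples; it is a sanity check on a log, not a proof
that a protocol meets the specification.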

The highly available leader election service was first defined
for synchronous systems in which all correct processes
are connected, that is, can communicate with each other in
a timely manner. Recently, research in fault-tolerant
systems has been investigating asynchronous partitionable
systems [8], [9], i.e. distributed systems in which the set
of processes can split into disjoint subsets due to network
failures or excessive performance failures (i.e. processes or
messages are not timely; see Section III for details). Like
many other authors, we call each such subset a partition.
For example, processes that run in different LANs
can become partitioned when the bridge or the network
that connects the LANs fails or is "too slow" (see Figure
4). One reason for the research in partitionable systems
is that the "primary partition" approaches [10] allow only
the processes in one partition to make progress. To increase
the availability of services, one often wants services
to make progress in all partitions.

Our recent design of a membership [11] and a clock
synchronization service for partitionable systems [12] has
indicated that we need a leader election service with different
properties for partitionable systems than for synchronous
systems. The first problem that we encountered is how to
specify the requirements of such a local leader election
service. Ideally, such a service should elect exactly one local
leader in each partition. However, it is not always possible
to elect a leader in each partition. For example, when
the processes in a partition suffer excessive performance
failures, one cannot enforce that there exists exactly one
local leader in that partition. To approach this problem,
we have to define in what partitions local leaders have to
be elected: we therefore introduce the notion of a stable
partition. Informally, all processes in a stable partition are
connected to each other, i.e. any two processes in a stable
partition can communicate with each other in a timely
manner. The processes in a stable partition are required to
elect a local leader within a bounded amount of time. An
election service might be able to elect a local leader in an
unstable partition, i.e. a partition that is not stable, but it
is not guaranteed that there will be a local leader in an
unstable partition. We call a process "unstable" when it is
part of an unstable partition.

In each stable partition, a local leader election service
has to elect exactly one local leader. In an unstable
partition the service might not be able to elect exactly one
local leader. It can be advantageous to split an unstable
partition into two or more "logical partitions" with one
local leader each if that enables the processes in each of
these logical partitions to communicate with each other in
a timely manner (see Figure 1). To explain this, note that
our definition of a "stable partition" will require that all
processes in such a partition be connected to each other.
This implies that when the connected relation in a partition
is not transitive, that partition is unstable. For example,
the connected relation can become non-transitive for three
processes {p, q, r} if the network link between p and r fails
or is overloaded while the links between p and q and q and
r stay correct (see Figure 2).
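
The {p, q, r} example can be made concrete with a small check of the
connected relation. This is a simplified sketch under stated
assumptions: the relation is modelled as a set of symmetric links
between process names, timing is ignored, and only the "all processes
are connected to each other" part of the informal description of a
stable partition is tested. The function names are hypothetical and
are not taken from the protocol described in this paper.

```python
from itertools import combinations, permutations


def all_connected(processes, links):
    """True if every pair of processes in the partition shares a link."""
    return all(frozenset(pair) in links for pair in combinations(processes, 2))


def is_transitive(processes, links):
    """True if a-b and b-c connected always implies a-c connected."""
    return all(
        frozenset((a, c)) in links
        for a, b, c in permutations(processes, 3)
        if frozenset((a, b)) in links and frozenset((b, c)) in links
    )


processes = ["p", "q", "r"]

# The link between p and r has failed or is overloaded; p-q and q-r are correct.
links = {frozenset(("p", "q")), frozenset(("q", "r"))}
print(is_transitive(processes, links), all_connected(processes, links))

# Once the p-r link is timely again, every pair is connected.
links.add(frozenset(("p", "r")))
print(is_transitive(processes, links), all_connected(processes, links))
```

In the first configuration the relation is not transitive, so the
three processes cannot form a stable partition, although p and r can
still reach each other through q.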

In specific circumstances, our local leader service splits
an unstable partition into two or more logical partitions
with one leader in each. The service makes sure that
timely communication between any two processes in a
logical partition is possible. However, sometimes this
communication has to go via the local leader in case two processes
p and r in a logical partition are only connected through
the local leader q (see Figure 2.b). Informally, a log